Nucleosome organization in eukaryotic genomes has a deep impact on gene function. Although
progress has been recently made in the identification of various concurring factors influencing nucleosome
positioning, it is still unclear whether nucleosome positions are sequence dictated or determined by a
random process. It has been postulated for a long time that, in the proximity of TSS, a barrier determines
the position of the +1 nucleosome and then geometric constraints alter the random positioning process
determining nucleosomal phasing. Such a pattern fades out as one moves away from the barrier to become
again a random positioning process. Although this statistical model is widely accepted, the molecular
nature of the barrier is still unknown. Moreover, we are far from the identification of a set of sequence
rules able: to account for the genome-wide nucleosome organization; to explain the nature of the barriers
on which the statistical mechanism hinges; to allow for a smooth transition from sequence-dictated to
statistical positioning and back. Here we show that sequence complexity, quantified via various methods,
can be the rule able to at least partially account for all the above. In particular, we have conducted our
analyses on four high resolution nucleosomal maps of the model eukaryotes S. cerevisiae, C. elegans and
D. melanogaster, and found that nucleosome depleted regions can be well distinguished from nucleosome
enriched regions by sequence complexity measures. In particular, the depleted regions are less complex
than the enriched ones. Moreover, around TSS, complexity measures alone are in striking agreement
with in vivo nucleosome occupancy, in particular precisely indicating the positions of the +1 and -1
nucleosomes. Those findings indicate that the intrinsic richness of subsequences within sequences plays a
role in nucleosomal formation in genomes, and that sequence complexity constitutes the molecular nature
of nucleosome barrier.

1 Background

It is well known that chromatin organization in Eukaryotic genomes has a deep impact on gene regulation 
and function, e.g., [1]. Therefore, it comes as no surprise that a substantial amount of research has been
devoted to this fascinating topic, with particular focus on the identification of nucleosome positions within 
the genomes of model organisms and of mechanisms influencing that positioning. Although quite a bit 
of progress has been made on the identification of various concurring factors that influence nucleosome 
positioning [10], an answer to the question raised by Kornberg in 1981 [4], namely to what extent nucleosome
positions are "sequence dictated" or determined by a random process, has remained elusive. As a matter
of fact, it has been the object of intense debate, in particular in view of a result by Widom et al. [9]
claiming that there exists a genomic code for nucleosome positioning. In mathematical terms, a code
is a very specific and constrained object, even when it is degenerate, as the genetic code is. Therefore, it
seems that a verbatim use of the term code in the context of chromatin studies would be quite misleading
and actually very easy to challenge, e.g., [11, 12]. Indeed, the focus has shifted from sequence dictating
to sequence influencing chromatin organization, a finding that is much less amenable to challenge and
dismissal [3]. In contrast to the "sequence dictating-influencing" debate just outlined, the statistical
positioning mechanism has met very little challenge. It is worth recalling that it is based on two main
ingredients: the existence of a barrier and of a statistical law governing the positioning of the nucleosomes
[5]. Generalizing the earlier results by Kornberg and Stryer just mentioned, Mobius and Gerland [8] have
shown that such a mechanism can be described and quantified by the Tonks model of statistical physics.
In particular, in the proximity of TSS [6, 8], the barrier determines the position of the +1 nucleosome and
then geometric constraints alter the random positioning process, determining the well-known nucleosomal
pattern in DNA. Such a pattern fades out as one moves away from the barrier to become again a random
positioning process. An analogous behaviour is followed by the -1 nucleosome.
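
The statistical positioning mechanism recalled above can be illustrated with a minimal sketch of a one-dimensional hard-rod (Tonks) gas. The sketch assumes a finite box whose two walls act as barriers and a fixed number of rods, which is a simplification and not necessarily the formulation used in [8]. The function names, the box length and the sample count are hypothetical choices made only for illustration; the rod length of 147 bp corresponds to the nucleosome core.

```python
import random

def sample_hard_rods(n_rods, rod_len, box_len, rng):
    """Draw one uniformly distributed configuration of non-overlapping rods.

    Returns the sorted left-end positions of n_rods rods of length rod_len
    inside [0, box_len]. The walls at 0 and box_len play the role of barriers.
    """
    free = box_len - n_rods * rod_len
    if free < 0:
        raise ValueError("rods do not fit in the box")
    # Sorted uniform points in the free length, then re-insert the rod lengths:
    # this maps the ordered simplex onto all non-overlapping configurations.
    gaps = sorted(rng.uniform(0.0, free) for _ in range(n_rods))
    return [g + i * rod_len for i, g in enumerate(gaps)]

def occupancy_profile(n_samples, n_rods, rod_len, box_len, seed=0):
    """Average number of rod centers per position over many sampled configurations."""
    rng = random.Random(seed)
    counts = [0] * box_len
    for _ in range(n_samples):
        for start in sample_hard_rods(n_rods, rod_len, box_len, rng):
            counts[int(start + rod_len / 2)] += 1
    return [c / n_samples for c in counts]

if __name__ == "__main__":
    # Illustrative parameters only: 10 rods of 147 bp in a 2000 bp box.
    profile = occupancy_profile(5000, 10, 147, 2000)
    print(max(range(len(profile)), key=profile.__getitem__))
```

When the rods are densely packed, the averaged profile is expected to show peaks near each wall that fade toward the middle of the box, which is the qualitative phasing pattern described in the text.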

2 Statement of Results 

All of the above studies leave open several questions. Indeed, given the poor results in finding a consistent
and concise set of "sequence rules", e.g., [11, 12], that explain how the sequence influences nucleosome
positioning, it is a challenging problem to show that such a set of rules exists. Analogously, although
the statistical model is widely accepted [10, 12], the molecular nature of the barrier is unknown [12].
Moreover, the interplay between sequence-dictated and statistical positioning has hardly been explored,
although it is to be expected that the true cellular state is a combination of both mechanisms
[2]. A study by Mavrich et al. [6], performed on DNA regions around 4799 TSS in yeast, addresses in
part this latter problem, indicating that the sequence dictates the positioning of the so-called +1 and -1
nucleosomes, leaving the rest to the statistical positioning mechanism. In that study, an effort is also
made to identify some of the compositional properties of the nucleosome-free region (NFR) responsible for
the formation of the barriers. However, it is also pointed out that the identified sequence biases, mostly
for dinucleotides, may be special cases of more general, and yet unknown, properties of the genomic
sequences upstream and downstream of a TSS. In a nutshell, we are far from the identification of a concise set
of sequence rules able: (a) to account for the genome-wide nucleosome organization in a genome; (b) to
explain the nature of the barriers on which the statistical mechanism hinges; (c) to allow for a smooth
transition from sequence-dictated to statistical positioning and back.

As said before, a code for nucleosome positioning seems to be too stringent. Fortunately, there are 
other intrinsic properties of sequences, in particular the ones measured by complexity measures, that 
may play a role in influencing nucleosomal organization in a eukaryotic genome. It is worth recalling 
that complexity measures usually quantify the "intrinsic richness" of distinct subsequences within a given 
sequence. Here we study whether sequence complexity, quantified via various methods, can be the rule 
able to at least partially account for all (a)-(c) above. Towards this end, we have the following results. 
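
To make the notion of "intrinsic richness" concrete, the following is a minimal sketch of one common formulation of linguistic complexity: the number of distinct words of length 1 to max_k observed in a sequence, divided by the maximum number that could be observed. This sum-ratio form, the function name and the default parameters are assumptions made for illustration; the exact definitions used in this study may differ.

```python
def linguistic_complexity(seq, max_k=17, alphabet_size=4):
    """Fraction of the attainable vocabulary actually used by seq.

    For each word length k from 1 to max_k, count the distinct k-mers that occur
    and compare with the largest count possible, min(alphabet_size**k, n - k + 1).
    Returns a value in [0, 1]; 0.0 for an empty sequence.
    """
    seq = seq.upper()
    n = len(seq)
    observed = 0
    possible = 0
    for k in range(1, min(max_k, n) + 1):
        n_windows = n - k + 1
        distinct = len({seq[i:i + k] for i in range(n_windows)})
        observed += distinct
        possible += min(alphabet_size ** k, n_windows)
    return observed / possible if possible else 0.0

if __name__ == "__main__":
    # A homopolymer uses very little of its attainable vocabulary.
    print(linguistic_complexity("A" * 50))
    print(linguistic_complexity("ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCATGCAA"))
```

Under this formulation a homopolymer run scores close to zero, while a sequence in which almost every word is new scores close to one.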

(1) We have conducted extensive studies on four high resolution nucleosomal maps of three model
organisms, i.e., yeast, C. elegans and Drosophila melanogaster [3, 7, 12], to obtain the following
results. Nucleosome depleted regions in each map can be well distinguished from nucleosome enriched
regions in the same map by sequence complexity measures. In particular, the depleted regions are less
complex than the enriched ones. Such a finding indicates that the intrinsic richness of subsequences
within sequences plays a role in influencing nucleosomal formation in genomes, addressing point (a)
above.

(2) We have applied our methodology to the same TSS dataset of Mavrich et al. [6], and also accounted
for the additional insights by Mobius and Gerland [8]. We find that the NFR is characterized by
an area of lower complexity with respect to the two regions flanking it, with the TSS being placed
in the proximity of the absolute minimum of each complexity curve we have computed. The two
barriers are towards the local maxima at the left and right of the TSS. Particularly striking is the
complexity curve obtained with linguistic complexity, based on a dictionary of words of length up to
seventeen (therefore much larger than the one considered by Mavrich et al.). Indeed, the positions
of the +1 and -1 nucleosomes are in the proximity of the first maximum to the right
and to the left, respectively, of the TSS on the complexity curve. In view of point (b) above, such a
finding suggests, at least as far as TSS are concerned, that the combinatorial properties of the NFR
subsequence are strongly associated with the creation of the barriers, highlighting that the nature of
the barriers is at least in part combinatorial, i.e., dictated by intrinsic properties of sequences.

(3) Since the complexity of subsequences within a sequence can be modulated, points (1) and (2) above
suggest that, at least for TSS, such a modulation can smoothly accommodate both sequence-dictated
and statistical positioning, in a continuum that requires no other particular arrangement
for switching between the two "states" of interest. Indeed, the low complexity area characterizing
the NFR indicates where nucleosomes should not be, giving also an indication that the +1 and -1
nucleosomes must be placed towards the local maxima of the complexity curve at the end of the
NFR. Such an interpretation sheds some light on point (c) above.
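
The reading of a complexity curve described in points (2) and (3) can be sketched as follows: compute a sliding-window profile, locate its absolute minimum, then walk outward on each side to the first local maximum. This is only an illustration under stated assumptions. The simple fixed-length k-mer richness used here stands in for the complexity measures of the study, and the window size, the function names and the absence of any smoothing are hypothetical choices.

```python
def kmer_richness(window, k=3):
    """Distinct k-mers in the window divided by the maximum attainable number."""
    n_windows = len(window) - k + 1
    if n_windows <= 0:
        return 0.0
    distinct = len({window[i:i + k] for i in range(n_windows)})
    return distinct / min(4 ** k, n_windows)

def complexity_profile(seq, window=40, step=1, measure=kmer_richness):
    """Return (window center, complexity value) pairs along seq."""
    return [(start + window // 2, measure(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]

def _climb(values, start, direction):
    """Walk from start while the curve does not decrease; return the last index."""
    i = start
    while 0 <= i + direction < len(values) and values[i + direction] >= values[i]:
        i += direction
    return i

def minimum_and_flanking_maxima(profile):
    """Positions of the left maximum, the absolute minimum and the right maximum."""
    positions = [p for p, _ in profile]
    values = [v for _, v in profile]
    i_min = min(range(len(values)), key=values.__getitem__)
    left = _climb(values, i_min, -1)
    right = _climb(values, i_min, +1)
    return positions[left], positions[i_min], positions[right]

if __name__ == "__main__":
    flank = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCATGCAA"
    toy = flank + "A" * 60 + flank  # low-complexity stretch between richer flanks
    print(minimum_and_flanking_maxima(complexity_profile(toy, window=20)))
```

On real data the curve would typically be averaged over many aligned TSS regions or smoothed before the maxima are read off, since an unsmoothed single-sequence profile can contain many small local peaks.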